Sequencing and Raw Sequence Data Quality Control    ◾    25

analysis, we may need to filter out the reads that have low-quality bases or to trim the ends
of the reads beginning from the 34th base. Figure 1.15 shows a per base sequence quality
graph without a warning.
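
The two remedies mentioned above, trimming and filtering, can be sketched in a few lines of Python. This is only an illustration: the function names, the Phred+33 encoding, and the minimum score of 20 are assumptions made for the example, not values taken from the text. Only the 34th-base cut point comes from the discussion above.

```python
def trim_read(seq, qual, first_bad_base=34):
    """Drop everything from the 1-based position `first_bad_base` to the 3' end.

    `seq` and `qual` are the sequence and quality lines of one FASTQ record.
    """
    keep = first_bad_base - 1  # number of leading bases to retain
    return seq[:keep], qual[:keep]


def has_low_quality_base(qual, min_phred=20, offset=33):
    """Return True if any base has a Phred score below `min_phred`.

    Assumes Phred+33 encoded qualities; `min_phred` is a hypothetical cutoff.
    """
    return any(ord(ch) - offset < min_phred for ch in qual)


# A made-up 36-base read with uniform quality 'I' (Phred 40).
seq, qual = trim_read("A" * 36, "I" * 36)
print(len(seq), len(qual))
```

In practice a dedicated trimming tool would normally do this work; the sketch only shows what the operation amounts to for a single record.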

1.5.3  Per Tile Sequence Quality

The per tile sequence quality is represented by a heatmap that is available only if an
Illumina sequencer was used and the reads in the FASTQ file retain their original
identifiers, including the IDs of the flow cell tiles on which the reads were sequenced.
Figure 1.16 shows the first two records of the FASTQ file “SRR576933.fastq”. Notice that
the identifier line of each record contains the sequence ID, the flow cell ID, the lane
number, the tile number, the x- and y-coordinates of the cluster within the tile, and the
read length.
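
To make the identifier fields concrete, here is a minimal Python sketch that splits such a line into its parts. The exact layout is an assumption for illustration: `@<sequence ID> <flow cell ID>:<lane>:<tile>:<x>:<y> length=<n>`. The example identifier is made up and is not copied from Figure 1.16, and real files may order or delimit the fields differently.

```python
from typing import NamedTuple


class ReadId(NamedTuple):
    seq_id: str
    flow_cell: str
    lane: int
    tile: int
    x: int
    y: int
    length: int


def parse_identifier(line: str) -> ReadId:
    """Parse an identifier line in the assumed layout described above."""
    if not line.startswith("@"):
        raise ValueError("a FASTQ identifier line must start with '@'")
    seq_id, location, length_field = line[1:].split()
    # Split from the right so that a flow cell ID containing ':' stays intact.
    flow_cell, lane, tile, x, y = location.rsplit(":", 4)
    length = int(length_field.split("=", 1)[1])
    return ReadId(seq_id, flow_cell, int(lane), int(tile), int(x), int(y), length)


# Hypothetical identifier line, for illustration only.
example = "@READ.1 FLOWCELL_X:2:1:1269:948 length=36"
print(parse_identifier(example))
```

The tile number recovered this way is what allows quality scores to be grouped by tile.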

In the graph, the base positions are plotted on the x-axis against the physical positions
on the flow cell (tile numbers) on the y-axis. The base quality is represented by a color
scale from blue (cold) to red (hot). Blue indicates that the quality of the base from that
tile is at or above the average for that base position in the run. Red indicates that the
quality for the base in that tile is worse than the quality for the same base position in
the other tiles.
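
The quantity behind the color scale can be approximated as the mean quality of each tile at each base position, minus the average over all tiles at that position. The following Python sketch follows that idea under stated assumptions: qualities are Phred+33 encoded, the input is a list of `(tile, quality string)` pairs, and the function name is hypothetical. It is not the actual implementation of any quality control tool.

```python
from collections import defaultdict


def per_tile_deviation(records, offset=33):
    """Return {tile: [deviation per base position]}.

    A deviation is the tile's mean Phred score at a position minus the mean
    of the per-tile means at that position. Values of zero or above would be
    drawn cold (blue); negative values would be drawn hot (red).
    """
    sums = defaultdict(list)
    counts = defaultdict(list)
    for tile, qual in records:
        s, c = sums[tile], counts[tile]
        while len(s) < len(qual):  # grow the accumulators for longer reads
            s.append(0)
            c.append(0)
        for i, ch in enumerate(qual):
            s[i] += ord(ch) - offset
            c[i] += 1

    means = {t: [a / n for a, n in zip(sums[t], counts[t])] for t in sums}
    if not means:
        return {}

    deviation = {t: [] for t in means}
    for i in range(max(len(m) for m in means.values())):
        column = [m[i] for m in means.values() if i < len(m)]
        average = sum(column) / len(column)
        for t, m in means.items():
            if i < len(m):
                deviation[t].append(m[i] - average)
    return deviation


# Made-up data: tile 2 is worse than tile 1 at the second base.
records = [(1, "II"), (1, "II"), (2, "I5")]
print(per_tile_deviation(records))
```

Here tile 2 gets a negative value at the second position, which corresponds to a red cell in the heatmap, while every other cell is at or above the average.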

The graph provides an easy way to track the average quality scores from each tile across all

FIGURE 1.14  Per base sequence quality with warning.